Abstract

In classical probability theory, the best predictor of a future observation of a random variable $X$
is its expected value $E_P[X]$ when no other information is available. When information consisting of
the observation of another random variable $Y$ is available, the best predictor of $X$ is another
random variable, $E_P[X|Y]$. The purpose of this note is to explore the analogue of this in the case
of quantum mechanics. We shall see that, exactly as in classical prediction theory, when the result
of an observation is taken into account by means of a non-commutative conditional expectation,
some of the usual paradoxes cease to be such.
I. INTRODUCTION AND PRELIMINARIES

In this section we recall the difference between measurement and prediction, describe what
classical prediction consists of, and finally recall the basics of non-commutative conditional
expectation needed to present a quantum analogue of classical prediction theory.

A. Prediction versus measurement 

Failing to distinguish between these concepts may be a source of confusion. Consider the
following examples. Suppose you own some stock and want to decide whether to sell today
or wait until tomorrow, or you want to answer one of the following questions: What is
the outcome of the toss of a fair die? What will the position and velocity of a particle be at
some future time, if it moves under the action of a given force and we know its position
and velocity now?

One possibility is to wait till tomorrow, and check the financial pages to find the value 
of your stock, or to toss the die and observe the outcome, or to measure the position at the 
specified time. That is, you may answer your question by means of a measurement. 

But if you want to decide on a future course of action, you have to proceed differently.
In the case of the particle, you may solve Newton's equations of motion and, from the
information available now, make your prediction. In the other two cases, things are a bit
more complicated: the relationship between today's data and tomorrow's data is of a
probabilistic nature. One thing you may do is to specify the probabilities of the possible
results and leave it at that. Or you may make a prediction. This consists of three steps:
choose a random variable (usually specified in the question to be answered), choose a
predictor and compute it, and specify a measure of the error in your prediction.

The simplest predictor that is usually used is the mean value: if $X$ denotes the
random variable, the simplest predictor of the next outcome of $X$ is the expected
value $E[X]$ of $X$, and the simplest measures of error are quantities of the type
$P\{|X - E[X]| > \epsilon\}$ or $E[(X - E[X])^2]$. The quantity $\epsilon$ is the error that the
experimenter or decision maker feels comfortable with. It is important to keep in mind that a
prediction is something made with "pencil and paper," that is, it does not involve measuring,
regardless of the fact that it uses statistical data. It is akin to guessing the result of the next
measurement. To belabor the point, notice that the expected value does not need to be one of
the possible results of a measurement, as the example of the die shows.
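
As a small illustration of the three steps above, the following sketch works out the fair-die example with exact fractions. The function name `tail_probability` and the choice $\epsilon = 2$ are ours, introduced only for illustration.

```python
from fractions import Fraction

# Fair die: outcomes 1..6, each with probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)

# Step 2: the simplest predictor, the expected value E[X].
mean = sum(p * x for x in outcomes)

# Step 3: two measures of error for that predictor.
variance = sum(p * (x - mean) ** 2 for x in outcomes)   # E[(X - E[X])^2]

def tail_probability(eps):
    """P(|X - E[X]| > eps) for the fair die."""
    return sum(p for x in outcomes if abs(x - mean) > eps)

print(mean)                  # 7/2, which is not a possible outcome of the die
print(variance)              # 35/12
print(tail_probability(2))   # 1/3
```

Nothing is measured here: all three numbers are computed from the assumed probabilities alone.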

Keeping this distinction between prediction and measurement is important when interpreting
quantities like $\langle\psi, A\psi\rangle$. For example, if $A$ is an operator describing an observable,
then for a given $\psi$, $(A - \langle\psi, A\psi\rangle)^2$ is another operator describing an observable.
The simplest predictor of the next measurement of that observable is
$\sigma^2(A) = \langle\psi, (A - \langle\psi, A\psi\rangle)^2\psi\rangle$, which should be interpreted as
the (expected) measurement error of $A$ in the state $\psi$.

Notice as well that, if $A, B, C$ are such that $[A, B] = iC$, then it is a well-known result
that $\sigma(A)\sigma(B) \geq \frac{1}{2}|\langle\psi, C\psi\rangle|$, which is to be interpreted as follows:
for any state $\psi$, the predicted errors in the measurement of $A$ and $B$ and the predicted value
of $C$ satisfy the inequality; in other words, there exists no $\psi$ for which the inequality is
violated. But keep in mind that this is an inequality about standard deviations, or predicted
measurement errors, not an inequality about results of actual measurements. Thus keeping in mind
the distinction between measurement and prediction is essential.
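
The inequality can be checked with pencil and paper, or with a few lines of code. The sketch below is our own illustration, not part of the original argument: it takes $A = \sigma_x$ and $B = \sigma_y$ (the Pauli matrices), for which $[A, B] = iC$ with $C = 2\sigma_z$, and evaluates both sides for a state parametrized by two angles. The helper names `expect`, `std` and `check` are hypothetical.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expect(op, psi):
    """<psi, op psi> for a 2x2 operator and a normalized 2-vector."""
    op_psi = [sum(op[i][k] * psi[k] for k in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * op_psi[i] for i in range(2))

def std(op, psi):
    """Predicted error sigma(op) = sqrt(<op^2> - <op>^2) in the state psi."""
    var = expect(matmul(op, op), psi).real - expect(op, psi).real ** 2
    return math.sqrt(max(var, 0.0))   # guard against tiny negative rounding

A = [[0, 1], [1, 0]]        # sigma_x
B = [[0, -1j], [1j, 0]]     # sigma_y
C = [[2, 0], [0, -2]]       # [A, B] = iC with C = 2 sigma_z

def check(theta, phi):
    """Return (sigma(A) sigma(B), |<psi, C psi>| / 2) for a Bloch-sphere state."""
    psi = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]
    return std(A, psi) * std(B, psi), 0.5 * abs(expect(C, psi))

lhs, rhs = check(1.0, 0.3)
print(lhs >= rhs)   # True
```

Both sides are computed from the state alone; no measurement outcome enters the comparison.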

B. A reminder about prediction in classical science

In classical mechanics in particular, and in many mathematical models, the issue of prediction
appears in essentially two different ways. On one hand we have systems, like those of classical
mechanics, whose dynamics, be it regular or irregular, is deterministic. On the other hand we
have systems whose dynamics may be deterministic, but what it determines are transition
probabilities; that is, if you know the probabilities of occurrence of the different states at
$t = 0$ and the ingredients of the Chapman-Kolmogorov equation, you may determine exactly
the occupation probabilities at any later time.

For regular deterministic systems life is easy in principle, for statistical analysis is usually
necessary only to deal with indeterminacy in the initial data. When a system is deterministic
but irregular, prediction of the kind available for regular systems is feasible only in the short
term. For long-term prediction we can resort to statistical prediction in the event that the
system is chaotic.

Prediction theory is a well-developed theory for classical (as opposed to quantum)
stochastic systems. When dealing with systems for which there are no dynamics, like
predicting the winner of an election, we resort to probabilistic modeling combined with
statistical analysis of available data. It is here (and for classical stochastic systems) that the
notion of conditional expectation has proved to be a key idea. Random variables are modeled
as measurable functions on a "probability space" $(\Omega, \mathcal{F}, P)$, where the "sample space"
$\Omega$ is just a set, the questions we agree to ask about the system (namely the available
information about the system) are the elements of a $\sigma$-algebra $\mathcal{F}$ of subsets of $\Omega$, and
$P$ is a measure on $\mathcal{F}$ assigning total mass 1 to $\Omega$. Given an integrable random variable $X$,
without any further information, the best prediction that we can make about a future observation
of that variable is $E_P[X] = \int X\,dP$, the expected value of $X$ with respect to $P$.

Also, when $X$ is integrable and we observe another random variable $Y$, we can ask what
is the best prediction about $X$ given that $Y$ has been observed. This best predictor happens
to be another random variable, which in all cases of practical interest is a function of $Y$.
It is denoted by $E[X|Y]$ and has the following property, which explains why it is the "best
predictor": $E[X|Y]$ realizes $\inf E[(X - \phi(Y))^2]$ over all bounded measurable functions $\phi$.
The formal definition of $E[X|Y]$ is that it is the unique function of $Y$ such that
$E[X g(Y)] = E[E[X|Y] g(Y)]$ for any bounded measurable function $g$.
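
Both properties can be verified by direct enumeration in a finite example. The sketch below is our own illustration: it takes two fair dice, lets $Y$ be the first die and $X$ the sum of both, computes $E[X|Y = y]$ from its definition, and compares its mean squared error with that of one competing function of $Y$. The competitor $\phi(y) = 2y$ and the test function $g(y) = y^2$ are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice. Y is the first die, X is the sum of both.
omega = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(omega))

def X(w):
    return w[0] + w[1]

def Y(w):
    return w[0]

def expectation(f):
    """E[f] with respect to the uniform probability on omega."""
    return sum(prob * f(w) for w in omega)

def cond_exp_given(y):
    """E[X | Y = y], computed as E[X 1{Y=y}] / P(Y = y)."""
    num = expectation(lambda w: X(w) if Y(w) == y else 0)
    den = expectation(lambda w: 1 if Y(w) == y else 0)
    return num / den

def mse(predictor):
    """E[(X - predictor(Y))^2] for a predictor that is a function of Y."""
    return expectation(lambda w: (X(w) - predictor(Y(w))) ** 2)

best = mse(cond_exp_given)        # phi(y) = E[X | Y = y]
other = mse(lambda y: 2 * y)      # a competing function of Y

# Defining property E[X g(Y)] = E[E[X|Y] g(Y)] for one test function g.
g = lambda y: y * y
lhs = expectation(lambda w: X(w) * g(Y(w)))
rhs = expectation(lambda w: cond_exp_given(Y(w)) * g(Y(w)))

print(cond_exp_given(3), best, other, lhs == rhs)   # 13/2 35/12 35/6 True
```

Here $E[X|Y = y] = y + 7/2$, and the residual error $35/12$ is exactly the variance of the unobserved die.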

That this predictor is a random variable which is a function of $Y$ means, for
example, that if $Y$ is discrete and takes values $\{y_1, y_2, \ldots\}$, then, when $Y$ is observed to
assume the value $y_j$, the value of $E_P[X|Y]$ is $E_P[X|Y = y_j] = \int X(\omega)\,dP(\omega|Y = y_j)$,
and this happens with probability $P(Y = y_j)$. For the sake of comparison with the quantum case,
note that in this case $E_P[X|Y]$ can be represented as

$$E_P[X|Y] = \sum_j E_P[X|Y = y_j]\, I_{\{Y = y_j\}}, \qquad (1)$$

where for any event $A$, the indicator function $I_A$ is a dichotomic random variable taking
values 1 or 0 according to whether $\omega \in A$ or not.

The properties of $E[X|Y]$ can be found in almost any probability book, and their use to
analyze the classical analogues of some of the standard quantum paradoxes is carried out in
Gzyl (2004), where the basic measure-theoretic concepts are recalled.

A very simple example, reminiscent of some quantum paradoxes, goes as follows. Consider
two binary random variables $X$ and $Y$ taking values $\pm 1$, and note that

P(X = -1 | X + Y = 0; Y = 1) = 1. 
The reader should supply the proof. The interpretation is obvious: given that we are in a
state of total spin 0 (i.e., given that the event $X + Y = 0$ occurs), and that $Y = 1$, our
prediction is that $X = -1$ occurs with probability equal to 1. We do not have to measure
it, nor is there anything propagated between $X$ and $Y$. For comparison with the results
for the non-commutative case, note that in general $P(A|B, C) = P_C(A|B)$, where $P_C$ is the
original probability conditioned upon the occurrence of $C$, i.e., for any event $A$ we have
$P_C(A) = P(A|C)$. In our example $C = \{X + Y = 0\}$, and $P_C$ is carried by $C$ in the
sense that $P_C(A) = 0$ whenever $P(A \cap C) = 0$. Having said this, we add that another way
to obtain the predicted value $E_P[X|Y = y_j]$ of $X$ when $Y = y_j$ is observed is to compute

$$E_{P_j}[X] = E_{P_j}[E_P[X|Y]],$$

where $P_j$ is defined to be the conditional probability $P(\cdot|Y = y_j)$. This has a quantum
counterpart, as we shall mention below.
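
The example of the two binary variables can be checked by enumeration. In the sketch below the joint law of $(X, Y)$ is a hypothetical choice of ours (any strictly positive weights would do), and the helper names `P` and `condition` are introduced only for illustration. The code computes $P(A|B, C)$ directly and also in two steps, as $P_C(A|B)$.

```python
from fractions import Fraction

# Hypothetical joint law of (X, Y) on {-1, +1}^2; any positive weights work.
law = {(+1, +1): Fraction(1, 8), (+1, -1): Fraction(3, 8),
       (-1, +1): Fraction(1, 4), (-1, -1): Fraction(1, 4)}

def P(event, measure=law):
    """Probability of `event` (a predicate on outcomes) under `measure`."""
    return sum(p for w, p in measure.items() if event(w))

def condition(measure, event):
    """Measure conditioned on `event`, which is assumed to have positive mass."""
    mass = sum(p for w, p in measure.items() if event(w))
    return {w: (p / mass if event(w) else Fraction(0))
            for w, p in measure.items()}

A = lambda w: w[0] == -1           # X = -1
B = lambda w: w[1] == +1           # Y = +1
C = lambda w: w[0] + w[1] == 0     # X + Y = 0, "total spin 0"

both = lambda w: B(w) and C(w)
direct = P(lambda w: A(w) and both(w)) / P(both)          # P(A | B, C)

P_C = condition(law, C)                                   # P conditioned on C
two_step = P(lambda w: A(w) and B(w), P_C) / P(B, P_C)    # P_C(A | B)

print(direct, two_step)   # 1 1
```

The answer does not depend on the weights chosen: once $C$ and $Y = 1$ are given, only the outcome $(-1, +1)$ remains.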

C. Conditional expectations in quantum mechanics 

Since the subject is rather technical, it is not surprising that it does not show up in
(advanced) introductory books like the one by Peres (1993), nor in serious popular expositions
like Accardi's (1997), Selleri's (1990) or Ghirardi's (2005); but it is not even mentioned in a
nice advanced book like Landsman's (1998). We should also remark that two classic books
on quantum estimation, the volumes by Helstrom (1976) and Holevo (1982), do not
even mention the possibility of using non-commutative conditional probabilities to develop
the quantum analogue of classical prediction. The theme is not considered either in the
monograph on quantum measurement theory by Busch et al. (1991). The material below is
taken from Gudder and Marchand (1972), where references to the basic literature can be
found. But see also Gudder (1979), where the comparison with classical probability theory
is examined.

Our setup will be standard: a separable Hilbert space $\mathcal{H}$ is chosen to describe a particular
system, we consider the von Neumann algebra $\mathcal{A}$ of all bounded operators on $\mathcal{H}$, and
$\mathcal{P}_{\mathcal{A}}$ will denote the class of all self-adjoint projections in $\mathcal{A}$. Two related concepts
are contained in the following definitions. A measure is a mapping $w : \mathcal{P}_{\mathcal{A}} \to [0, \infty)$
such that (i) $w(0) = 0$, and (ii) $w(\sum_j A_j) = \sum_j w(A_j)$ for any countable collection of mutually
orthogonal $\{A_j\}$ in $\mathcal{P}_{\mathcal{A}}$. A linear functional $w : \mathcal{A} \to \mathbb{C}$ is an integral if it
satisfies (i) and (ii) above and also (iii) $w(A) \geq 0$ if $A$ is a positive element of $\mathcal{A}$.
When $w(I) = 1$, $w$ is called a state.

Let $w(A) = \mathrm{tr}(WA)$, where $W$ is a self-adjoint, positive operator such that $\mathrm{tr}(W) = 1$.
The operator $W$ is called the density operator of $w$. Let us now recall the following definition.

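Definition 1.1 With the notation introduced above, let $\mathbb{B} \subset \mathcal{A}$ be a sub-algebra of $\mathcal{A}$.
The $\mathbb{B}$-expectation of $A \in \mathcal{A}$ is an operator $E_w[A|\mathbb{B}]$ satisfying
$w(B E_w[A|\mathbb{B}] B) = w(BAB) = w_A(B)$ for all $B \in \mathcal{P}_{\mathbb{B}}$.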

Comment 1.1 According to the standard quantum mechanical formalism, after a measurement
of $B \in \mathcal{P}_{\mathcal{A}}$ the state becomes $\tilde{W} = BWB/w(B)$; hence the expected value of $A$
in this state is $w(A|B) = w(BAB)/w(B) = \mathrm{tr}(BWBA)/\mathrm{tr}(WB) = \mathrm{tr}(\tilde{W}A) = \tilde{w}(A)$.
This is the quantum analogue of the comment made at the end of Section I B.

Perhaps an interesting name for $\mathbb{B}$ is the measurement algebra. This is the analogue of
the classical $\sigma$-algebra $\sigma(Y)$ determined by the observation of a random variable $Y$.

The result that allows us to think of conditional expectations as predictors is the following.

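Theorem 1.1 Assume that the sub-algebra $\mathbb{B}$ is such that

$$E_w[AC|\mathbb{B}] = E_w[A|\mathbb{B}]\,C \quad \text{and} \quad E_w[CA|\mathbb{B}] = C\,E_w[A|\mathbb{B}]$$

for any $C \in \mathbb{B}$. Then $E_w[A|\mathbb{B}]$ is the best predictor of $A$ by an element of $\mathbb{B}$.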

Comment 1.2 It is easy to verify that when there exists a discrete family $\{P_j : j \geq 1\}$ of
projectors such that any $C \in \mathbb{B}$ can be written as $C = \sum_j c_j P_j$, then the condition holds.

Proof We have to verify that $E_w[A|\mathbb{B}]$ is the minimizer of $E_w[(A - C)^2]$ when $C$ varies in
$\mathbb{B}$. The argument is the usual one: consider

$$E_w\big[\big((A - E_w[A|\mathbb{B}]) + (E_w[A|\mathbb{B}] - C)\big)^2\big],$$

expand the square, invoke the assumptions to get rid of the cross products, and arrive at

$$E_w[(A - C)^2] = E_w[(A - E_w[A|\mathbb{B}])^2] + E_w[(E_w[A|\mathbb{B}] - C)^2],$$

which clearly achieves its minimum when $C = E_w[A|\mathbb{B}]$. □
II. DOUBLE SLIT LIKE SCENARIOS

Consider, with no further specification, a system with an underlying Hilbert space $\mathcal{H}$ of
dimension larger than 2, and suppose it is prepared in an initial state $W = |\psi_0\rangle\langle\psi_0|$,
where $|\psi_0\rangle = a|+\rangle + b|-\rangle$ and $a$ and $b$ are complex numbers such that $|a|^2 + |b|^2 = 1$.
So, instead of a wave coming from infinity and impinging on a two-holed screen, we create
two waves emitted from two different points, but we do not know from which. By $|\pm\rangle$ we
denote the wave (or particle) emitted from the upper or the lower hole, respectively.
Or think of an atom which may be in its ground state or in an excited state. Thus,
the expected value of any observable $A$ with respect to the density operator $W$ (or in the state
$w$) is $\mathrm{tr}(WA) = \langle\psi_0|A|\psi_0\rangle$.

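Let $B_+ = |+\rangle\langle +|$ and $B_- = |-\rangle\langle -|$ be the orthogonal projectors, assume that
$|-\rangle$ and $|+\rangle$ are orthogonal, and let $\mathbb{B}$ be the (commutative) algebra generated by
$B_+$ and $B_-$.

According to Theorem 1.1, the best predictor of an observable $A$ given the measurement
algebra $\mathbb{B}$ is

$$E_w[A|\mathbb{B}] = \frac{w(B_+ A B_+)}{w(B_+)}\,B_+ + \frac{w(B_- A B_-)}{w(B_-)}\,B_-. \qquad (2)$$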

We shall now consider some particular cases. Suppose that $|+\rangle = |x_+\rangle = |0, 0, +1\rangle$ and
$|-\rangle = |x_-\rangle = |0, 0, -1\rangle$ are, respectively, the eigenstates of the position
operator corresponding to a particle localized at either of $\pm 1$ along the $z$-axis, and let
$A = e^{-itH}|x\rangle\langle x|e^{itH}$. Here $H$ stands for the Hamiltonian of a particle of unit mass, in
units in which Planck's constant $\hbar$ is 1.

Note to begin with that if we use the notation $K_t(x; x') = \langle x'|e^{-itH}|x\rangle$, then the
expected value of $A$ in the state $w$ is $w(A) = \mathrm{tr}(WA) = |aK_t(x; x_+) + bK_t(x; x_-)|^2$.
That is, if we do not observe which source is active, the probability of finding the particle at $x$
satisfies the standard "wave-particle duality" implied by the superposition principle.

If we decide to observe the position of the source, the result will be one of two possible
values, and the predictor of $A$ given $\mathbb{B}$ is given in (2). To compute it explicitly, note that

$$w(B_+ A B_+) = \mathrm{tr}(W B_+ A B_+) = \mathrm{tr}(B_+ W B_+ A B_+) = |aK_t(x; x_+)|^2,$$

and a similar-looking expression is obtained for $w(B_- A B_-)$. Also, $w(B_+) = |a|^2$, and so on.
With all this, (2) is

$$E_w[A|\mathbb{B}] = |K_t(x; x_+)|^2 B_+ + |K_t(x; x_-)|^2 B_-, \qquad (3)$$

and keep in mind that this is an observable. When we observe the position to be $(0, 0, +1)$,
the initial state is reduced to $B_+ W B_+/w(B_+)$ and the expected value of $E_w[A|\mathbb{B}]$ is
just $|K_t(x; 0, 0, 1)|^2$, as the standard analysis of the double slit experiment asserts. Observe
also that the expected value of $E_w[A|\mathbb{B}]$ in the state $w$ is

$$w(E_w[A|\mathbb{B}]) = \mathrm{tr}(W E_w[A|\mathbb{B}]) = |a|^2 |K_t(x; x_+)|^2 + |b|^2 |K_t(x; x_-)|^2,$$

which is the classical expected value of the observed amplitude of particles emitted from
$z = \pm 1$ with probabilities $|a|^2$ and $|b|^2$ respectively.
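
The difference between the two predictions is the interference term. The sketch below is a numerical illustration under assumptions of ours: we take $H$ to be the free-particle Hamiltonian in three dimensions (the text does not fix $H$), and the amplitudes, detection point and time are hypothetical values. It computes the unconditioned prediction $w(A)$, the prediction $w(E_w[A|\mathbb{B}])$, and checks that they differ exactly by $2\,\mathrm{Re}\,(aK_+ \overline{bK_-})$.

```python
import cmath
import math

def free_kernel(t, x, x0):
    """Free-particle propagator in 3D, unit mass, hbar = 1 (assumed choice of H)."""
    r2 = sum((u - v) ** 2 for u, v in zip(x, x0))
    return (2j * math.pi * t) ** (-1.5) * cmath.exp(1j * r2 / (2 * t))

a, b = 0.6, 0.8j                           # amplitudes with |a|^2 + |b|^2 = 1
x_plus, x_minus = (0, 0, 1), (0, 0, -1)    # the two sources on the z-axis
x, t = (5.0, 0.0, 0.7), 2.0                # hypothetical detection point and time

k_plus = free_kernel(t, x, x_plus)
k_minus = free_kernel(t, x, x_minus)

# No observation of the source: w(A) = |a K+ + b K-|^2.
unobserved = abs(a * k_plus + b * k_minus) ** 2

# Expected value of the predictor (3) in the state w.
conditioned = abs(a) ** 2 * abs(k_plus) ** 2 + abs(b) ** 2 * abs(k_minus) ** 2

# Prediction once the source is observed at x_plus.
given_plus = abs(k_plus) ** 2

# The two predictions differ by the interference term.
interference = 2 * (a * k_plus * (b * k_minus).conjugate()).real
print(abs(unobserved - (conditioned + interference)) < 1e-12)   # True
```

Changing the detection point `x` changes the sign and size of `interference`, while `conditioned` varies smoothly, as a classical mixture would.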

Had we insisted on seeing waves at $x$, we should have considered $A = E^H_E\,|x\rangle\langle x|\,E^H_E$.
The analogue of (3) is

$$E_w[A|\mathbb{B}] = \phi(x - x_+) B_+ + \phi(x - x_-) B_-, \qquad (4)$$

where

$$\phi(x - x_i) = \frac{e^{i\omega\|x - x_i\|}}{\|x - x_i\|}$$

and $\omega = \sqrt{2E}$.

Again, if no observation is made as to where the source of particles is located, the expected
value of $A$ in the state $w$ is

$$w(A) = |a\phi(x - x_+) + b\phi(x - x_-)|^2,$$

whereas if we observe that the particle is at $x_-$, then the observed signal at $x$ is $\phi(x - x_-)$,
that is, the expected value of $E_w[A|\mathbb{B}]$ in the state $B_- W B_-/w(B_-)$. Or, if we compute the
expected value of $E_w[A|\mathbb{B}]$ in the original state $w$, the result will be
$|a|^2\phi(x - x_+) + |b|^2\phi(x - x_-)$, as classical physics would have predicted.



III. PREDICTION IN THE PRESENCE OF CONSERVATION LAWS 

Suppose now that our system is a composite system, with Hilbert space
$\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$, and let $C$ be the self-adjoint operator on $\mathcal{H}$ denoting some
conserved quantity. Suppose, to keep it simple, that there is no interaction between the
subsystems and that $C = C_1 + C_2$ (or more properly, $C = C_1 \otimes I_2 + I_1 \otimes C_2$). We shall
consider two variants of the same situation.
A. Schrödinger's cat type paradoxes

Suppose that the system is originally prepared in a state with density operator
$W = |\psi_0\rangle\langle\psi_0|$, where $|\psi_0\rangle = a|1, 0\rangle + b|0, 1\rangle$ and where, without fearing too
much ambiguity, $C_1|0\rangle = 0|0\rangle$ and $C_1|1\rangle = |1\rangle$, and similarly for $C_2$. Think of
$|1, 0\rangle\langle 1, 0|$ as the projector on a state with an excited atom and no photon, whereas
$|0, 1\rangle\langle 0, 1|$ is the projector on a state with the atom in its ground state plus a photon. In the
second state the photon may be absorbed at a wall enclosing the system, turning it blue (leave
the poor cat alone). If we look at the wall and see it blue, the theory should predict that
the atom has decayed. Again, the conditional expectation is not defined for all $A \in \mathcal{A}$, but
the elements of the measurement algebra are easy to describe, for the algebra is generated
by two orthogonal projectors.

If we do not look at the wall (or do not make any observation about the photon), the
prediction of the value of any observable $A$ in the state $W$ is
$w(A) = \mathrm{tr}(AW) = \langle\psi_0|A|\psi_0\rangle$. In particular, the probability of detecting a photon is
$w(I_1 \otimes |1\rangle\langle 1|) = \mathrm{tr}(W\, I_1 \otimes |1\rangle\langle 1|) = |b|^2$, which is the same as the
probability of finding the atom in the ground state, namely
$w(|0\rangle\langle 0| \otimes I_2) = \mathrm{tr}(W\, |0\rangle\langle 0| \otimes I_2) = |b|^2$. The observation algebra when we
are looking at the photon is the algebra $\mathbb{B}$ generated by the operators
$B_0 = I_1 \otimes |0\rangle\langle 0|$ and $B_1 = I_1 \otimes |1\rangle\langle 1|$. We do not consider more states of the
photon number operator because, in the initial state that we prepared, there can be one photon at most.

Note now that if a photon is observed, the computation to predict the probability of
finding the atom in its ground state is (with $\tilde{W} = B_1 W B_1/w(B_1)$)

$$\mathrm{tr}\big(E_w[\,|0\rangle\langle 0| \otimes I_2\,|\,\mathbb{B}]\,\tilde{W}\big) = 1,$$

as it should be, for if one photon is observed, the atom can only be in the ground state.
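
Since every operator involved here is diagonal in the basis $|{\rm atom}, {\rm photon}\rangle$, the computation reduces to bookkeeping with amplitudes. The following sketch is our own illustration with hypothetical values of $a$ and $b$; the helper names `prob` and `reduce_state` are not from the text. It evaluates the two unconditioned probabilities and then the probability of finding the atom in its ground state after the photon has been observed.

```python
# Basis labels are (atom, photon); the prepared state is a|1,0> + b|0,1>.
a, b = 0.6, 0.8j                       # hypothetical amplitudes, |a|^2 + |b|^2 = 1
psi = {(1, 0): a, (0, 1): b}

def prob(event, state):
    """Probability that a measurement in this basis lands in `event`."""
    return sum(abs(amp) ** 2 for label, amp in state.items() if event(label))

def reduce_state(event, state):
    """Projection postulate: keep the components in `event` and renormalize."""
    norm = prob(event, state) ** 0.5
    return {label: amp / norm for label, amp in state.items() if event(label)}

photon_seen = lambda label: label[1] == 1
atom_ground = lambda label: label[0] == 0

p_photon = prob(photon_seen, psi)                 # |b|^2, no observation made
p_ground = prob(atom_ground, psi)                 # |b|^2, no observation made

psi_after = reduce_state(photon_seen, psi)        # state once the wall is blue
p_ground_given_photon = prob(atom_ground, psi_after)
```

With these values `p_photon` and `p_ground` both equal $|b|^2 = 0.64$ up to rounding, while `p_ground_given_photon` equals 1 up to rounding.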

B. EPR type paradoxes 

Our setup will be similar to that of the previous section; the change in notation is for emphasis.
That is, we shall consider a composite system with Hilbert space
$\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$, and let $L$ be the self-adjoint operator on $\mathcal{H}$ denoting some
conserved quantity. Suppose, to keep it simple, that there is no interaction between the
subsystems and that $L = L_1 + L_2$. Let us call $L_i$ the "spin" of the $i$-th particle and $L$
the total spin.
As initial state we consider $W = |\psi_0\rangle\langle\psi_0|$, where
$|\psi_0\rangle = a|+1, -1\rangle + b|-1, +1\rangle$, and assume that $L_i|\pm 1\rangle = \pm|\pm 1\rangle$ for $i = 1, 2$.
To avoid issues related to degeneracy, assume that these eigenvalues are non-degenerate. Thus
$L|\psi_0\rangle = 0|\psi_0\rangle$. The given superposition only reflects the fact that there are two possible
ways in which the composite system can have total value of $L$ equal to 0.

Again, we assume that the observation algebra is generated by the observation of the state
of one of the particles. For example, let the algebra $\mathbb{B}$ be generated by the projectors
$B_+ = I_1 \otimes |+1\rangle\langle +1|$ and $B_- = I_1 \otimes |-1\rangle\langle -1|$.

If we observe the second particle to be $-1$, with what probability is the spin of the
first particle $+1$? The initial state reduces to $\tilde{W} = B_- W B_-/w(B_-)$, and the pending
computation is

$$\tilde{w}\big(E_w[\,|{+1}\rangle\langle{+1}| \otimes I_2\,|\,\mathbb{B}]\big) = \mathrm{tr}\big(B_- W B_-\,(|{+1}\rangle\langle{+1}| \otimes I_2)\big)/\mathrm{tr}(W B_-) = 1,$$

that is, once we have observed the second particle to have spin $-1$, we can assert that with
probability 1 the first particle has spin $+1$. We do not have to measure it, for it
is a certain event.
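
The same number can be obtained with explicit matrices. The sketch below is our own illustration: it represents the two-particle space by $4 \times 4$ matrices in the basis order $|{+}{+}\rangle, |{+}{-}\rangle, |{-}{+}\rangle, |{-}{-}\rangle$, uses hypothetical amplitudes $a$ and $b$, and computes the coefficients $c_\pm = w(B_\pm A B_\pm)/w(B_\pm)$ of the predictor $c_+ B_+ + c_- B_-$ for $A = |{+1}\rangle\langle{+1}| \otimes I_2$.

```python
def kron(p, q):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[p[i // 2][j // 2] * q[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def mul(p, q):
    """Product of two 4x4 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def trace(p):
    return sum(p[i][i] for i in range(4))

I2 = [[1, 0], [0, 1]]
UP = [[1, 0], [0, 0]]       # |+1><+1|
DOWN = [[0, 0], [0, 1]]     # |-1><-1|

a, b = 0.6, 0.8j            # hypothetical amplitudes, |a|^2 + |b|^2 = 1
psi = [0, a, b, 0]          # a|+1,-1> + b|-1,+1>
W = [[psi[i] * complex(psi[j]).conjugate() for j in range(4)] for i in range(4)]

B_plus, B_minus = kron(I2, UP), kron(I2, DOWN)   # observation of particle 2
A = kron(UP, I2)                                 # "particle 1 has spin +1"

def w(op):
    """The state w(op) = tr(W op); real for the operators used here."""
    return trace(mul(W, op)).real

# Coefficients c_k = w(B_k A B_k) / w(B_k) of the predictor of A.
coeff = {name: w(mul(mul(Bk, A), Bk)) / w(Bk)
         for name, Bk in (("+", B_plus), ("-", B_minus))}
```

Here `coeff["-"]` is the predicted probability that the first particle has spin $+1$ once the second is observed at $-1$, and `coeff["+"]` is the corresponding prediction when the second is observed at $+1$; with the values above they come out as 1 and 0 respectively.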

IV. CONCLUDING REMARKS 

I hope to have convinced the reader that, as far as the paradoxes analyzed here go, the
whole mystery in quantum mechanics lies in the superposition principle, which in the examples
treated enters in the specification of the initial states. When the result of an observation
is taken into account by proper conditioning, as in classical probability, the paradoxes are
removed, but the mystery associated with the superposition principle is there to stay.